Abstract

Background: Herbal medicine has long been viewed as a valuable asset for potential new drug discovery, and the
metabolites of herbal ingredients, especially the in vivo metabolites, are often found to have better pharmacological,
pharmacokinetic and even safety profiles than their parent compounds. However, information on these herbal
metabolites is still scattered and has yet to be collected.

Description: The HIM database manually collects what is, to date, the most comprehensive available in vivo metabolism
information for herbal active ingredients, together with their corresponding bioactivity, organ and/or tissue
distribution, toxicity, ADME and clinical research profiles. Currently HIM contains 361 ingredients and 1104
corresponding in vivo metabolites from 673 reputable herbs. Tools for structural similarity search, substructure search
and Lipinski's Rule of Five evaluation are also provided. Links are provided to PubChem, PubMed, TCM-ID (Traditional
Chinese Medicine Information Database) and HIT (Herbal Ingredients' Targets database).

Conclusions: A curated database, HIM, has been set up for in vivo metabolite information on the active ingredients of
Chinese herbs, together with their corresponding bioactivity, toxicity and ADME profiles. HIM is freely accessible to
academic researchers at http://www.bioinformatics.org.cn/.
Background

As one of the medical systems of natural origin, Chinese
herbal medicine (CHM) has developed over several thousand
years and has accumulated abundant clinical experience and
pharmacological information, forming its own integrated
theoretical system [1]. As CHM is a multi-component,
multi-target therapy, studies of its molecular mechanism
have made great progress in recent years, although much
still remains unclear [2-4]. To gain deeper insight into
the mechanism of CHM, various modern scientific
technologies have been applied to separate and purify the
active ingredients from herbs and to elucidate their
pharmacodynamic characteristics [5]. Over the past few
years, many active compounds have been separated and their
pharmacological effects tested [6-8]. More interestingly,
in many studies [9-11], active metabolites were sometimes
found to have better pharmacological, pharmacokinetic and
safety profiles than their respective parent compounds. For
example, morphine is a widely used analgesic extracted from
Papaver somniferum L., and its major therapeutic benefit is
mediated by morphine-6-glucuronide, an active metabolite of
morphine [12,13]. As another example, ginsenoside Rb1, a
major active ingredient of Panax ginseng, exerts its
antiallergic activity through its main metabolite, compound
K, rather than by itself [14]. Similarly, glycyrrhizic
acid, an alicyclic compound extracted from Glycyrrhiza
glabra L., has no anti-lipid peroxidation effect in rat
hepatocytes, whereas its metabolite, glycyrrhetinic acid,
inhibits lipid peroxidation in a dose-dependent manner
[15]. Many similar instances can be found, which implies
that metabolites of herbal ingredients could be highly
valuable for new drug discovery.

Currently, abundant metabolism information on herbal
active ingredients has been produced with the progress of
traditional Chinese medicine (TCM) modernization. However,
although several databases, such as the MDL Metabolite
Database [16] and the Accelrys Metabolism Database [17],
carefully store pharmacokinetic and metabolism data for
synthetic compounds, there is still no specific database
that collects and stores the corresponding information for
herbal active ingredients. Metabolic databases of synthetic
compounds have made great contributions to new drug
discovery [18]. Constructing a database that collects
metabolism information on CHM ingredients could likewise
have a substantial positive impact on TCM development.

In our previous work, to collect the available information
on protein targets for FDA-approved drugs and promising
precursors, we developed HIT [19]
(http://lifecenter.sgst.cn/hit/). As a follow-up to HIT,
HIM is a database that aims to provide systematic and
accurate data storage, data access and data analysis (i.e.,
structural similarity search and substructure analysis) for
the in vivo metabolism information of herbal active
ingredients. In this work, the in vivo metabolism data of
active ingredients extracted from herbs were collected from
the literature, unpublished in-house experimental data and
Chinese herbal medicine monographs. The information from
these heterogeneous data sources was further processed and
integrated into a well-designed database. The information
on each ingredient is divided into three categories:
identification label, metabolic scheme and bioactivity
information. Additionally, properties such as the number of
hydrogen bond (H-bond) donors and acceptors, the molecular
weight and the octanol-water partition coefficient logP,
which allow evaluation of Lipinski's Rule of Five, can be
found in the database. The 2D structures of all compounds
are available, and structure similarity search and
substructure search functions are also provided. In
summary, HIM currently stores 361 active ingredients from
673 Chinese herbs and 1104 corresponding metabolites. All
the data are freely accessible at
http://www.bioinformatics.org.cn/ for academic research.

Construction and content 

Data source 

The data in HIM were compiled from both primary and 
secondary sources. 

First, in vivo metabolism data for Chinese herbal
ingredients were extracted from the PubMed [20] literature
by searching with the keywords: metabolism, metabolite,
biotransformation, metabolic, CHM, Chinese herbal medicine,
in vivo. A preliminary screening was then carried out by
manually browsing all the abstracts. After that, we checked
the full text of all qualifying articles and extracted the
information according to the database criteria. Finally,
the data were confirmed once they had passed the quality
control process, which consists of rechecking and revising.
In summary, about one-third of all entries come from the
literature.
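
The keyword search described above could be reproduced programmatically through the NCBI E-utilities ESearch endpoint. The following is only a minimal sketch: the text lists the keywords but does not state how they were combined, so the AND/OR grouping, the helper names (or_group, build_query, build_esearch_url) and the retmax value are assumptions made for illustration. The code only builds the request URL and sends nothing.

```python
from urllib.parse import urlencode

# Keywords listed in the text. How they were combined is not stated,
# so the grouping into three AND-ed groups below is an assumption.
METABOLISM_TERMS = ["metabolism", "metabolite", "biotransformation", "metabolic"]
HERB_TERMS = ["CHM", "Chinese herbal medicine"]
CONTEXT_TERMS = ["in vivo"]

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def or_group(terms):
    """Quote each term and join the group with OR."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query():
    """Combine the three keyword groups with AND (assumed logic)."""
    groups = [METABOLISM_TERMS, HERB_TERMS, CONTEXT_TERMS]
    return " AND ".join(or_group(g) for g in groups)


def build_esearch_url(retmax=100):
    """Return an ESearch URL for the query; no request is sent here."""
    params = {"db": "pubmed", "term": build_query(), "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query())
    print(build_esearch_url(retmax=20))
```

The returned abstracts would still need the manual screening, full-text checking and quality control steps described in the text; the query only automates the first retrieval step.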

Second, metabolism data were extracted from the book
entitled "Absorption, Distribution, Metabolism, Excretion,
Toxicity and Activity of The Chemical Constituents in
Traditional Chinese Medicines" [21]. This book is a
well-known TCM monograph in China on the ADME/T
(Absorption, Distribution, Metabolism, Excretion and
Toxicity) of CHM active ingredients. Approximately half of
the entries in the database are derived from this book,
making this valuable information available online for the
first time.

Third, some unpublished in vivo experimental data on the
metabolism of CHM ingredients were also gathered in HIM;
these account for the remaining minority of the entries.

Content and details 

The database HIM comprises three data fields for each 
active ingredient: identification label, metabolic scheme 
and bioactivity information. 

Identification label 

In this field, the following information is provided for
each record:

Common, Alias and Systematic Names. Both the Chinese
pinyin and the common English name are provided. The
aliases of each active ingredient, obtained from the
SciFinder database [22], are also listed. The systematic
name, which is generated from the IUPAC names of natural
product skeletal types, gives precise details of the
chemical structure.

CAS Registry Number. The CAS number provides a reliable
link between different systems of nomenclature as well as
access to future information on each ingredient. In
addition, the compounds are annotated with the CID numbers
provided by PubChem [23], each with a hyperlink to the
corresponding PubChem record.

Botanical Species. The Latin binomials of the herbs and the
part of the plant in which the ingredient is found are
listed.

Metabolic scheme 

In this field, detailed in vivo metabolism data for each
CHM active ingredient are available.

Molecular Weight and Molecular Formula. The MW and MF of
all compounds are calculated using the functions provided
by Marvin Beans [24], a Java package supplied by ChemAxon
[25].
Figure 1 Primary pages in HIM. (A) Keywords Text Search. (B) Draw a molecular structure for structure search. (C) Upload a MOL/SDF file
for structure search. (D) Result page of 'Text Search' with 'Name: Shikonin'. (E) Result page of 'Structure Similarity Search' with the structure
of the compound: Shikonin. (F) Detail Information of the compound: Shikonin. (G) Lipinski's Rule of Five properties for one metabolite
of Shikonin.



Structure. For each CHM active ingredient and its
metabolites, a 2D structure, stored in MDL mol format in
HIM, is shown in JPEG format on the web page.

Metabolite and Metabolic Scheme. All the metabolites of
each active ingredient and the full view of the in vivo
metabolic process are provided in HIM.
Bioactivity information

In this field, extensive information about the
pharmacokinetic (ADME) properties, bioactivity and toxicity
of each active ingredient is listed.

Bioactivity 

General terms such as anti-cancer, anti-inflammatory and
anti-bacterial, rather than disease-related protein
targets, are used to describe the bioactivity of each
active ingredient.



Toxicity

The generally recognized LD50 (median lethal dose) value is
used to represent toxicity, together with some concrete
descriptions for each ingredient.

Other Information. Besides bioactivity and toxicity, other
information such as absorption, distribution, clinical
research and the main references is also available (see
Figure 1).

Additional functions
Text search

The "Text Search" function on the homepage of the website
provides five distinct items for searching the whole
database: Compound Name, CAS Number, CID Number, Molecular
Formula and Keywords.

Structure similarity search

The "Structure Similarity Search" can be performed by
uploading a compound structure in MOL/SDF format or by
drawing the structure with an embedded molecule editor
applet, Marvin Sketch [24]. The search uses a so-called
structural fingerprint, a binary string of 1024 bits that
encodes the structural characteristics of a given compound.
The fingerprint is generated by the Chemistry Development
Kit (CDK) [26], and the Tanimoto coefficient is then
calculated by the background program. A molecule with a
Tanimoto coefficient > 0.85 to an active compound is often
assumed to have similar biological activity [27].
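
The Tanimoto comparison of 1024-bit fingerprints used by the structure similarity search can be illustrated with a short, library-free sketch. It is not the HIM implementation: in HIM the fingerprints come from CDK, whereas here each fingerprint is simply a Python integer whose set bits are supplied by hand, and the function names (fingerprint_from_bits, tanimoto, similar_compounds) are hypothetical. For two fingerprints with a and b bits set, of which c are shared, the coefficient is c / (a + b - c).

```python
FINGERPRINT_BITS = 1024  # fingerprint length stated in the text


def fingerprint_from_bits(on_bits):
    """Pack the indices of the set bits into one Python integer."""
    fp = 0
    for i in on_bits:
        if not 0 <= i < FINGERPRINT_BITS:
            raise ValueError(f"bit index {i} out of range")
        fp |= 1 << i
    return fp


def tanimoto(fp_a, fp_b):
    """Shared set bits divided by bits set in either fingerprint."""
    union = bin(fp_a | fp_b).count("1")
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = bin(fp_a & fp_b).count("1")
    return shared / union


def similar_compounds(query_fp, library, threshold=0.85):
    """Return (name, score) pairs scoring above the threshold, best first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score > threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Toy fingerprints, not derived from real structures.
    query = fingerprint_from_bits(range(0, 10))
    library = {
        "close analogue": fingerprint_from_bits(range(0, 9)),
        "distant compound": fingerprint_from_bits(range(5, 15)),
    }
    print(similar_compounds(query, library))
```

The default threshold of 0.85 mirrors the cut-off quoted in the text; a real search would replace the toy bit sets with fingerprints computed from the stored MOL structures.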



Substructure search 

Chemical substructure-based in silico techniques have been
widely used as an effective and popular approach to reduce
the cost of identifying molecules suitable for
pharmaceutical development in the early stage of drug
discovery [28,29]. In HIM, substructure search is also
available, implemented with JChem [30].

Website and server 

HIM is available online at
http://www.bioinformatics.org.cn/. It is designed as a
relational database and implemented in MySQL Server 5.0,
with Apache Tomcat 6.0 as the web server. For chemical
calculation and structure drawing, the CDK package and the
Marvin Sketch applet are embedded. The website is built
with JSP, HTML and CSS.
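
To make the relational design concrete, the sketch below models the one-to-many link between an active ingredient and its in vivo metabolites. It is an illustration under stated assumptions, not the HIM schema: the table and column names are hypothetical, and Python's built-in sqlite3 module stands in for the MySQL server so that the example is self-contained. The two ingredient-metabolite pairs are the examples cited in the Background.

```python
import sqlite3

# Hypothetical two-table layout; the actual HIM schema is not published here.
SCHEMA = """
CREATE TABLE ingredient (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    cas_number TEXT,
    pubchem_cid INTEGER,
    molecular_formula TEXT
);
CREATE TABLE metabolite (
    id INTEGER PRIMARY KEY,
    ingredient_id INTEGER NOT NULL REFERENCES ingredient(id),
    name TEXT NOT NULL,
    molecular_formula TEXT
);
"""


def open_demo_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_ingredient(conn, name, cas_number=None, metabolites=()):
    """Insert one ingredient and its metabolites; return the ingredient id."""
    cur = conn.execute(
        "INSERT INTO ingredient (name, cas_number) VALUES (?, ?)",
        (name, cas_number),
    )
    ingredient_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO metabolite (ingredient_id, name) VALUES (?, ?)",
        [(ingredient_id, m) for m in metabolites],
    )
    return ingredient_id


def metabolites_of(conn, ingredient_name):
    """List metabolite names for an ingredient, in insertion order."""
    rows = conn.execute(
        "SELECT m.name FROM metabolite AS m "
        "JOIN ingredient AS i ON i.id = m.ingredient_id "
        "WHERE i.name = ? ORDER BY m.id",
        (ingredient_name,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_demo_db()
    add_ingredient(conn, "Ginsenoside Rb1", metabolites=["Compound K"])
    add_ingredient(conn, "Glycyrrhizic acid", metabolites=["Glycyrrhetinic acid"])
    print(metabolites_of(conn, "Ginsenoside Rb1"))
```

Parameterized queries are used throughout so that search terms entered on a web form cannot alter the SQL statement.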

Utility 

HIM (http://www.bioinformatics.org.cn/), proposed in this
work as a follow-up to HIT, focuses on herbal active
ingredients with explicit in vivo metabolism data, since
the active metabolites of CHM were sometimes found to have
better pharmacological, pharmacokinetic and safety profiles
than their respective parent compounds. With the help of
HIM, researchers can investigate the mechanism of
pharmacological action of CHM more comprehensively, which
is expected to have a substantial positive impact on CHM
development.

Discussion 

Although CHM has been used for thousands of years and
enjoys an outstanding reputation, its mechanism is still
largely unknown. One contributing reason is that the in
vivo ADME/T processes of CHM, such as in vivo metabolism,
remain unclear. HIM is constructed as the first database to
store almost all the in vivo metabolism data of CHM active
ingredients available up to January 2012, as well as their
corresponding bioactivity, toxicity and ADME profiles. The
Lipinski's Rule of Five properties are also given for every
compound in the database. As one of the common rules,
Lipinski's Rule of Five is widely used in drug screening
and design. Blake's study [31] has shown that, across the
five stages from pre-clinical to approved, fewer and fewer
compounds break Lipinski's rules. This indicates that
compounds violating Lipinski's rule would need extensive
modification and have little probability of becoming a
drug. In HIM, about 90% of the compounds (herbal
ingredients and their metabolites) meet Lipinski's rule
(Figure 2). We hope that HIM can serve as a valuable
database to advance CHM modernization and to assist new
drug discovery and development.
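
As an illustration of the Rule of Five check discussed above, the sketch below tests the four standard thresholds (molecular weight at most 500, logP at most 5, at most 5 H-bond donors and at most 10 H-bond acceptors) against precomputed descriptors. The class and function names are hypothetical, the descriptor values in the usage example are invented rather than taken from HIM, and the tolerance of one violation is a common convention that the text does not confirm as HIM's criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties of one compound (hypothetical container)."""
    molecular_weight: float  # in daltons
    log_p: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d):
    """Return a description of every Rule of Five threshold exceeded."""
    checks = {
        "molecular weight > 500": d.molecular_weight > 500,
        "logP > 5": d.log_p > 5,
        "H-bond donors > 5": d.h_bond_donors > 5,
        "H-bond acceptors > 10": d.h_bond_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]


def passes_rule_of_five(d, max_violations=1):
    """Accept a compound with at most max_violations (assumed convention)."""
    return len(lipinski_violations(d)) <= max_violations


if __name__ == "__main__":
    # Invented descriptor values, for illustration only.
    small = Descriptors(350.0, 2.1, 2, 5)
    large = Descriptors(720.0, 6.3, 7, 12)
    for label, d in [("small", small), ("large", large)]:
        print(label, passes_rule_of_five(d), lipinski_violations(d))
```

Setting max_violations=0 gives the stricter reading in which a compound must satisfy all four thresholds.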

Conclusion 

HIM can be used to retrieve the metabolites of the active
ingredients in which researchers are interested. The
structure similarity search and substructure search can be
applied to find compounds with potentially similar
bioactivity to the query compound and can provide other
chemical and biological information on the query molecule.
Moreover, the database is useful for the study of
pharmacognosy. Although some active metabolic intermediates
are unstable and hard to obtain, fermentation technology
such as microbial transformation could be used to obtain
the active compounds that are the in vivo metabolites of
CHM: crude herbal medicines can be fermented by certain
microbial strains to yield certain products [32,33]. HIM
could provide valuable information for researchers who are
interested in TCM, drug design, pharmacognosy, drug
metabolism, etc.